QL algorithm
Rank-One Modified Value Iteration
Arman Sharifi Kolarijani, Tolga Ok, Peyman Mohajerin Esfahani, Mohamad Amin Sharif Kolarijani
In this paper, we provide a novel algorithm for solving planning and learning problems of Markov decision processes. The proposed algorithm follows a policy iteration-type update by using a rank-one approximation of the transition probability matrix in the policy evaluation step. This rank-one approximation is closely related to the stationary distribution of the corresponding transition probability matrix, which is approximated using the power method. We provide theoretical guarantees for the convergence of the proposed algorithm to the optimal (action-)value function, with the same rate and computational complexity as the value iteration algorithm in the planning problem and as the Q-learning algorithm in the learning problem. Through extensive numerical simulations, however, we show that the proposed algorithm consistently outperforms first-order algorithms and their accelerated versions for both planning and learning problems. (An illustrative code sketch of the rank-one evaluation step follows this entry's tags.)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)
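The abstract above names the main ingredients of the method: a value-iteration-style outer loop, a power-method estimate of the stationary distribution of the greedy policy's transition matrix, and a rank-one surrogate for that matrix in the policy evaluation step. The Python sketch below shows one way those pieces can fit together in the tabular planning setting; it is a rough illustration under assumed inputs (a known transition tensor `P` and reward matrix `R`), not the authors' exact update rule.

```python
import numpy as np

def rank_one_vi_sketch(P, R, gamma=0.9, iters=200, power_steps=5):
    """Illustrative planning loop (not the paper's exact algorithm).

    P: (S, A, S) transition tensor, R: (S, A) reward matrix -- assumed inputs.
    Each iteration: act greedily w.r.t. V, estimate the stationary distribution
    of the greedy policy's transition matrix with a few power-method steps, and
    evaluate the policy under the rank-one surrogate P_pi ~= 1 mu^T.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    mu = np.full(S, 1.0 / S)              # running stationary-distribution estimate
    for _ in range(iters):
        Q = R + gamma * (P @ V)           # one-step lookahead, shape (S, A)
        pi = Q.argmax(axis=1)             # greedy policy
        P_pi = P[np.arange(S), pi]        # (S, S) transitions under pi
        r_pi = R[np.arange(S), pi]        # (S,) rewards under pi
        for _ in range(power_steps):      # power method: mu <- mu P_pi
            mu = mu @ P_pi
            mu /= mu.sum()
        # With P_pi ~= 1 mu^T, policy evaluation V = r_pi + gamma * P_pi V becomes
        # closed-form: mu @ V = (mu @ r_pi) / (1 - gamma), hence:
        V = r_pi + gamma / (1.0 - gamma) * (mu @ r_pi) * np.ones(S)
    return V
```

The point of the rank-one surrogate is visible in the last line of the loop: policy evaluation collapses to a closed-form expression instead of a linear-system solve, which keeps the per-iteration cost comparable to value iteration.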
Multi-Agent Q-Learning for Real-Time Load Balancing User Association and Handover in Mobile Networks
Alireza Alizadeh, Byungju Lim, Mai Vu
As next-generation cellular networks become denser, associating users with the optimal base stations (BSs) at each time while ensuring no BS is overloaded becomes critical for achieving stable and high network performance. We propose multi-agent online Q-learning (QL) algorithms for performing real-time load-balancing user association and handover in dense cellular networks. The load balancing constraints at all BSs couple the actions of the user-equipment (UE) agents, and we propose two multi-agent action selection policies, one centralized and one distributed, to satisfy load balancing at every learning step. In the centralized policy, the actions of the UEs are determined by a central load balancer (CLB) running an algorithm that swaps the worst connection to maximize the total learning reward. In the distributed policy, each UE takes an action based on its local information by participating in a distributed matching game with the BSs to maximize its local reward. We then integrate these action selection policies into an online QL algorithm that adapts in real time to network dynamics, including channel variations and user mobility, using a reward function that includes a handover cost to reduce handover frequency. The proposed multi-agent QL algorithm features low complexity and fast convergence, outperforming 3GPP max-SINR association. Both policies adapt well to network dynamics across UE speed profiles ranging from walking and running to biking and suburban driving, illustrating their robustness and real-time adaptability. (An illustrative sketch of the centralized action-selection step follows this entry's tags.)
- North America > United States (0.28)
- Asia > South Korea > Busan > Busan (0.04)
- Asia > Middle East > Yemen > Amanat Al Asimah > Sanaa (0.04)
- Research Report (0.64)
- Workflow (0.46)
- Telecommunications (1.00)
- Energy > Power Industry (1.00)
- Leisure & Entertainment > Games > Computer Games (0.75)
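The centralized policy described above can be made concrete with a small sketch: each UE keeps its own Q-values over candidate BSs, actions are proposed epsilon-greedily, and a central step repairs any BS that would exceed its load limit before the Q-updates are applied. The repair rule below (move the lowest-value connection off an overloaded BS) is only a stand-in for the paper's CLB swap algorithm, and the reward, sizes, and constants are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_UE, N_BS, CAP = 8, 3, 3             # toy sizes; CAP = per-BS load limit (assumed)
GAMMA, ALPHA, EPS, HO_COST = 0.9, 0.1, 0.1, 0.2

Q = np.zeros((N_UE, N_BS))            # one Q-row per UE agent (state collapsed for brevity)
assoc = rng.integers(N_BS, size=N_UE)     # current UE -> BS association

def select_actions(Q):
    """Epsilon-greedy proposals followed by a simple centralized repair step
    that enforces the per-BS capacity (a stand-in for the CLB swap rule)."""
    prop = np.where(rng.random(N_UE) < EPS,
                    rng.integers(N_BS, size=N_UE),
                    Q.argmax(axis=1))
    for b in range(N_BS):                       # repair overloaded BSs
        ues = np.flatnonzero(prop == b)
        while len(ues) > CAP:
            worst = ues[np.argmin(Q[ues, b])]   # lowest-value connection leaves b
            for a in np.argsort(-Q[worst]):     # try its next-best BSs with room
                if a != b and np.sum(prop == a) < CAP:
                    prop[worst] = a
                    break
            if prop[worst] == b:                # no BS with room found; give up on b
                break
            ues = np.flatnonzero(prop == b)
    return prop

for _ in range(500):                            # online learning loop
    actions = select_actions(Q)
    rate = rng.random(N_UE)                     # hypothetical per-UE rate reward
    reward = rate - HO_COST * (actions != assoc)    # handover penalty on BS changes
    Q[np.arange(N_UE), actions] += ALPHA * (
        reward + GAMMA * Q.max(axis=1) - Q[np.arange(N_UE), actions])
    assoc = actions
```

The distributed policy would replace select_actions with a matching game between UEs and BSs; the handover-cost term in the reward is what discourages agents from switching BSs at every step.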
Q-learning Based Optimal False Data Injection Attack on Probabilistic Boolean Control Networks
Xianlun Peng, Yang Tang, Fangfei Li, Yang Liu
In this paper, we present a reinforcement learning (RL) method for solving optimal false data injection attack problems in probabilistic Boolean control networks (PBCNs), where the attacker lacks knowledge of the system model. Specifically, we employ a Q-learning (QL) algorithm to address this problem. We then propose an improved QL algorithm that not only enhances learning efficiency but also obtains optimal attack strategies for large-scale PBCNs that the standard QL algorithm cannot handle. Finally, we verify the effectiveness of our proposed approach on two attacked PBCNs: a 10-node network and a 28-node network. (A toy illustrative sketch of the attacker's QL loop follows this entry's tags.)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > Cuba (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
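To make the problem setup concrete, the toy sketch below treats a small probabilistic Boolean network as a black box and runs tabular Q-learning for an attacker that flips at most one bit of the control signal per step. The network rules, target state, and reward are invented for illustration and are not taken from the paper, and the improved QL variant for large-scale PBCNs is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3                                      # toy PBCN with 3 nodes (assumption)
TARGET = 0b101                             # attacker's desired network state (hypothetical)
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

def pbcn_step(state, control):
    """Black-box toy PBCN: each node picks one of two Boolean update rules at
    random (the probabilistic part); stands in for the unknown true system."""
    s = [(state >> i) & 1 for i in range(N)]
    u = [(control >> i) & 1 for i in range(N)]
    nxt = 0
    for i in range(N):
        if rng.random() < 0.5:
            bit = s[(i + 1) % N] ^ u[i]               # candidate rule 1
        else:
            bit = (s[i] & s[(i + 2) % N]) | u[i]      # candidate rule 2
        nxt |= bit << i
    return nxt

# Attacker's Q-table: states are the 2^N network states; action i < N flips
# control bit i (the false-data injection), action N injects nothing.
Q = np.zeros((2 ** N, N + 1))
state = 0
for _ in range(2000):                       # model-free interaction with the PBCN
    a = rng.integers(N + 1) if rng.random() < EPS else int(Q[state].argmax())
    legit = rng.integers(2 ** N)            # legitimate control signal (toy stand-in)
    attacked = legit ^ (1 << a) if a < N else legit
    nxt = pbcn_step(state, attacked)
    r = 1.0 if nxt == TARGET else 0.0       # reward the attacker for reaching TARGET
    Q[state, a] += ALPHA * (r + GAMMA * Q[nxt].max() - Q[state, a])
    state = nxt
```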